lp-Recovery of the Most Significant Subspace among Multiple Subspaces with Outliers
Abstract
We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric outliers. We study the recovery of the global l0 subspace (i.e., the one containing the largest number of points) by minimizing the lp-averaged distances of data points from d-dimensional subspaces of R^D, where 0 < p ∈ R. Unlike other lp minimization problems, this minimization is non-convex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p ≤ 1, then the global l0 subspace can be recovered by lp minimization with overwhelming probability (which depends on the generating distribution and its parameters). Moreover, when homoscedastic noise is added around the underlying subspaces, then with overwhelming probability the generalized l0 subspace (the one with the largest number of points "around it") can be nearly recovered by lp minimization, with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the global l0 subspace cannot be recovered and the generalized one cannot even be nearly recovered.
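To make the objective concrete, here is a minimal numerical sketch of the lp-averaged-distance objective and a heuristic local minimizer for it. This is not the paper's algorithm or analysis: the function names (lp_distance_sum, irls_lp_subspace), the IRLS-style reweighting with weights dist^(p-2), and all synthetic-data parameters (D = 5, d = 2, sample sizes, field of outliers) are illustrative assumptions.

```python
import numpy as np

def lp_distance_sum(X, B, p):
    """lp-averaged objective: sum_i dist(x_i, span(B))^p.

    X: (n, D) data matrix; B: (D, d) orthonormal basis of a candidate subspace.
    """
    residuals = X - (X @ B) @ B.T          # component orthogonal to span(B)
    return np.sum(np.linalg.norm(residuals, axis=1) ** p)

def irls_lp_subspace(X, d, p, n_iter=100, eps=1e-8):
    """Heuristic IRLS sketch for locally minimizing the lp objective over
    d-dimensional subspaces: reweighted PCA with weights dist^(p-2),
    which down-weights far-away points when p < 2."""
    # initialize with ordinary PCA (the p = 2 minimizer)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    B = Vt[:d].T
    for _ in range(n_iter):
        dist = np.linalg.norm(X - (X @ B) @ B.T, axis=1)
        w = (dist + eps) ** (p - 2)         # IRLS weights
        C = (X * w[:, None]).T @ X          # weighted covariance
        _, eigvecs = np.linalg.eigh(C)
        B = eigvecs[:, -d:]                 # top-d weighted principal directions
    return B

# toy example: two 2-dimensional subspaces in R^5 plus spherically symmetric outliers
rng = np.random.default_rng(0)
D, d = 5, 2
U0 = np.linalg.qr(rng.standard_normal((D, d)))[0]      # "most significant" subspace
U1 = np.linalg.qr(rng.standard_normal((D, d)))[0]
X = np.vstack([rng.standard_normal((200, d)) @ U0.T,   # 200 points on U0
               rng.standard_normal((80, d)) @ U1.T,    # 80 points on U1
               rng.standard_normal((60, D))])          # 60 outliers
B1 = irls_lp_subspace(X, d, p=1.0)
B2 = irls_lp_subspace(X, d, p=2.0)
# distance of each estimate to the dominant subspace U0 (smaller is better)
dist_to_U0 = lambda B: np.linalg.norm(U0 @ U0.T - B @ B.T)
print("p=1 error:", dist_to_U0(B1), " p=2 error:", dist_to_U0(B2))
print("p=1 objective:", lp_distance_sum(X, B1, 1.0))
```

On synthetic mixtures of this kind the p = 1 estimate tends to align with the most significant subspace, while the p = 2 estimate (ordinary PCA-style averaging) is pulled toward a blend of the subspaces and the outliers, which mirrors the dichotomy between 0 < p ≤ 1 and p > 1 described in the abstract.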
Similar resources
Exact Subspace Segmentation and Outlier Detection by Low-Rank Representation
In this work, we address the following matrix recovery problem: suppose we are given a set of data points containing two parts, one part consisting of samples drawn from a union of multiple subspaces and the other consisting of outliers. We do not know which data points are outliers, or how many outliers there are. The rank and the number of the subspaces are also unknown. Can we detect the outl...
A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets
Many real applications require detecting outliers in high-dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormal sparse density units in s...
Robust Subspace Outlier Detection in High Dimensional Space
Rare data in a large-scale database are called outliers, and they reveal significant information about the real world. Subspace-based outlier detection is regarded as a feasible approach in very high-dimensional space. However, the outliers found in subspaces are in fact only part of the true outliers in high-dimensional space. The outliers hidden among normal clustered points are sometimes neglected i...
A Geometric Analysis of Subspace Clustering with Outliers
This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspac...
Isotropic Constant Dimension Subspace Codes
In the network coding setting, a constant dimension code is a set of k-dimensional subspaces of F_q^n. If F_q^n is a nondegenerate symplectic vector space with bilinear form f, an isotropic subspace U of F_q^n is a subspace such that f(x, y) = 0 for all x, y ∈ U. We introduce isotropic subspace codes simply as sets of isotropic subspaces and show how the isotropic property is used in the decoding process, then...
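As a small illustration of the isotropy condition stated in that abstract, the sketch below checks f(x, y) = 0 for all x, y ∈ U by testing B J B^T = 0 (mod q) on a basis matrix B of U. The standard symplectic form J and the field size q = 5 are illustrative assumptions, not details taken from that paper.

```python
import numpy as np

q = 5                                    # assumed prime field size for F_q
m = 2                                    # ambient dimension n = 2m = 4
# standard symplectic form on F_q^(2m): f(x, y) = x J y^T
J = np.block([[np.zeros((m, m), dtype=int), np.eye(m, dtype=int)],
              [-np.eye(m, dtype=int), np.zeros((m, m), dtype=int)]]) % q

def is_isotropic(B):
    """B: (k, 2m) integer basis matrix of a subspace U of F_q^(2m).
    U is isotropic iff f(x, y) = 0 for all x, y in U,
    which holds iff B J B^T = 0 (mod q)."""
    return bool(np.all((B @ J @ B.T) % q == 0))

# span{e1, e2} is isotropic for this form ...
B_iso = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]])
# ... while span{e1, e3} is not, since f(e1, e3) = 1
B_not = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]])
print(is_isotropic(B_iso), is_isotropic(B_not))   # True False
```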
Journal: CoRR
Volume: abs/1012.4116
Publication date: 2010